Efficient High-Dimensional Kernel k-Means++ with Random Projection

Authors

Abstract

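Using random projection, a method to speed up both kernel k-means and centroid initialization with k-means++ is proposed. We approximate the kernel matrix and distances in a lower-dimensional space R^d before the kernel k-means clustering, motivated by upper error bounds. With random projections, previous work on bounds for dot products and an improved bound for kernel methods are considered for kernel k-means. The complexities of kernel k-means with Lloyd's algorithm and of centroid initialization with k-means++ are known to be O(nkD) and Θ(nkD), respectively, with n being the number of data points, D the dimensionality of the input feature vectors, and k the number of clusters. The proposed method reduces the computational complexity of the kernel computation of kernel k-means from O(n²D) to O(n²d), and that of the subsequent computation of k-means with Lloyd's algorithm and centroid initialization from O(nkD) to O(nkd). Our experiments demonstrate that the speed-up of the clustering method with reduced dimensionality d=200 is 2 to 26 times, with very little performance degradation (less than one percent) in general.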
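The core idea of the abstract, computing the kernel matrix after projecting the data from R^D down to R^d, can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian projection matrix, the RBF kernel, the synthetic data, and the names `random_projection` and `rbf_kernel` are all assumptions chosen for illustration. Only d=200 comes from the abstract.

```python
import numpy as np

def random_projection(X, d, rng):
    """Project the rows of X from R^D to R^d.

    Assumption: a dense Gaussian matrix scaled by 1/sqrt(d), which preserves
    squared distances in expectation. The paper may use a different scheme.
    """
    D = X.shape[1]
    R = rng.normal(size=(D, d)) / np.sqrt(d)
    return X @ R

def rbf_kernel(X, gamma):
    """Dense RBF kernel matrix; costs O(n^2 * dim) for n rows of X."""
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)  # clamp tiny negatives from round-off
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
n, D, d = 300, 5000, 200                    # d = 200 as in the abstract
X = rng.normal(size=(n, D)) / np.sqrt(D)    # synthetic rows with norm close to 1

Z = random_projection(X, d, rng)            # n x d instead of n x D

K_full = rbf_kernel(X, gamma=1.0)           # O(n^2 D) kernel computation
K_proj = rbf_kernel(Z, gamma=1.0)           # O(n^2 d) kernel computation

mean_abs_err = np.mean(np.abs(K_full - K_proj))
print(f"mean absolute kernel error: {mean_abs_err:.4f}")
```

The approximate matrix `K_proj` (or the projected points `Z`) would then be passed to kernel k-means and k-means++ seeding in place of the full-dimensional quantities, which is where the O(nkD) to O(nkd) reduction described above would apply.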

Similar resources

Scalable Kernel Clustering: Approximate Kernel k-means

Kernel-based clustering algorithms have the ability to capture the non-linear structure in real-world data. Among various kernel-based clustering algorithms, kernel k-means has gained popularity due to its simple iterative nature and ease of implementation. However, its run-time complexity and memory footprint increase quadratically in terms of the size of the data set, and hence, large data s...

Two-dimensional random projection

As an alternative to adaptive nonlinear schemes for dimensionality reduction, linear random projection has recently proved to be a reliable means for high-dimensional data processing. Widespread application of conventional random projection in the context of image analysis is, however, mainly impeded by excessive computational and memory requirements. In this paper, a two-dimensional random pro...

Kernel Penalized K-means: A feature selection method based on Kernel K-means

Article history: Received 11 June 2014 Received in revised form 23 October 2014 Accepted 11 June 2015 Available online 19 June 2015

Random Projection for Fast and Efficient Multivariate Correlation Analysis of High-Dimensional Data: A New Approach

In recent years, the advent of great technological advances has produced a wealth of very high-dimensional data, and combining high-dimensional information from multiple sources is becoming increasingly important in an extending range of scientific disciplines. Partial Least Squares Correlation (PLSC) is a frequently used method for multivariate multimodal data integration. It is, however, comp...

Almost Random Projection Machine with Margin Maximization and Kernel Features

Almost Random Projection Machine (aRPM) is based on generation and filtering of useful features by linear projections in the original feature space and in various kernel spaces. Projections may be either random or guided by some heuristics, in both cases followed by estimation of relevance of each generated feature. Final results are in the simplest case obtained using simple voting, but linear...

Journal

Journal title: Applied Sciences

Year: 2021

ISSN: 2076-3417

DOI: https://doi.org/10.3390/app11156963